Evaluation of Automatic Updates of Roget's Thesaurus

Authors

  • Alistair Kennedy
  • Stan Szpakowicz
Abstract

Thesauri and similarly organised resources attract increasing interest of Natural Language Processing researchers. Thesauri age fast, so there is a constant need to update their vocabulary. Since a manual update cycle takes considerable time, automated methods are required. This work presents a tuneable method of measuring semantic relatedness, trained on Roget's Thesaurus, which generates lists of terms related to words not yet in the Thesaurus. Using these lists of terms, we experiment with three methods of adding words to the Thesaurus. We add, with high confidence, over 5500 and 9600 new word senses to versions of Roget's Thesaurus from 1911 and 1987 respectively. We evaluate our work both manually and by applying the updated thesauri in three NLP tasks: selection of the best synonym from a set of candidates, pseudo-word-sense disambiguation and SAT-style analogy problems. We find that the newly added words are of high quality. The additions significantly improve the performance of Roget's-based methods in these NLP tasks. The performance of our system compares favourably with that of WordNet-based methods. Our methods are general enough to work with different versions of Roget's Thesaurus.

Keywords: lexical resources, Roget's Thesaurus, WordNet, semantic relatedness, synonym selection, pseudo-word-sense disambiguation, analogy
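To make the synonym selection task mentioned in the abstract concrete, here is a minimal sketch in Python. It is not the authors' method: the `TOY_INDEX` data, the `relatedness` scoring (counting shared leading levels of a hierarchy path) and the `best_synonym` helper are all hypothetical, chosen only to show how a thesaurus-based relatedness score can pick one candidate from a set.

```python
from typing import Dict, List, Tuple

# Hypothetical toy index: word -> path in a thesaurus-like hierarchy
# (class, section, head). Illustrative only; not data from Roget's Thesaurus.
TOY_INDEX: Dict[str, Tuple[str, ...]] = {
    "big": ("space", "dimensions", "size"),
    "large": ("space", "dimensions", "size"),
    "wide": ("space", "dimensions", "breadth"),
    "quick": ("space", "motion", "velocity"),
    "sad": ("affections", "personal", "dejection"),
}


def relatedness(a: str, b: str) -> int:
    """Count the leading hierarchy levels two words share (assumed measure).

    Returns 0 when either word is missing from the index.
    """
    path_a, path_b = TOY_INDEX.get(a), TOY_INDEX.get(b)
    if path_a is None or path_b is None:
        return 0
    score = 0
    for level_a, level_b in zip(path_a, path_b):
        if level_a != level_b:
            break
        score += 1
    return score


def best_synonym(target: str, candidates: List[str]) -> str:
    """Return the candidate most related to the target (first wins on ties)."""
    return max(candidates, key=lambda c: relatedness(target, c))


if __name__ == "__main__":
    print(best_synonym("big", ["sad", "quick", "wide", "large"]))
```

In this sketch a word that is absent from the index scores 0 against every candidate, so the choice falls back to the first candidate. That is one way to see why adding new words to a thesaurus could matter for this kind of task.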

Similar resources

A Comparison of WordNet and Roget's Taxonomy for Measuring Semantic Similarity

This paper presents the results of using Roget's International Thesaurus as the taxonomy in a semantic similarity measurement task. Four similarity metrics were taken from the literature and applied to Roget's. The experimental evaluation suggests that the traditional edge counting approach does surprisingly well (a correlation of r=0.88 with a benchmark set of human similarity judgements, with...

Mapping Roget's Thesaurus and WordNet to French

Roget’s Thesaurus and WordNet are very widely used lexical reference works. We describe an automatic mapping procedure that effectively produces French translations of the terms in these two resources. Our approach to the challenging task of disambiguation is based on structural statistics as well as measures of semantic relatedness that are utilized to learn a classification model for associat...

PunFields at SemEval-2017 Task 7: Employing Roget's Thesaurus in Automatic Pun Recognition and Interpretation

The article describes a model of automatic interpretation of English puns, based on Roget’s Thesaurus, and its implementation, PunFields. In a pun, the algorithm discovers two groups of words that belong to two main semantic fields. The fields become a semantic vector based on which an SVM classifier learns to recognize puns. A rule-based model is then applied for recognition of intentionally a...

Word Sense Disambiguation in Roget's Thesaurus Using WordNet

We describe a simple method of disambiguating word senses in Roget's Thesaurus using information about the sense of the word in WordNet. We present a few variations on this method, compare their performance and discuss the results. We explain why this type of disambiguation can be useful.

Journal:
  • J. Language Modelling

Volume 2  Issue 

Pages  -

Publication date 2014